PanomiR: a systems biology framework for analysis of multi-pathway targeting by miRNAs

Abstract Charting microRNA (miRNA) regulation across pathways is key to characterizing their function. Yet, no method currently exists that can quantify how miRNAs regulate multiple interconnected pathways or prioritize them for their ability to regulate coordinate transcriptional programs. Existing methods primarily infer one-to-one relationships between miRNAs and pathways using differentially expressed genes. We introduce PanomiR, an in silico framework for studying the interplay of miRNAs and disease functions. PanomiR integrates gene expression, mRNA–miRNA interactions and known biological pathways to reveal coordinated multi-pathway targeting by miRNAs. PanomiR utilizes pathway-activity profiling approaches, a pathway co-expression network and network clustering algorithms to prioritize miRNAs that target broad-scale transcriptional disease phenotypes. It directly resolves differential regulation of pathways, irrespective of their differential gene expression, and captures co-activity to establish functional pathway groupings and the miRNAs that may regulate them. PanomiR uses a systems biology approach to provide broad but precise insights into miRNA-regulated functional programs. It is available at https://bioconductor.org/packages/PanomiR.

Current best practice for the transcriptomic study of miRNA regulation focuses upon miRNA-gene or one-to-one miRNApathway relationships.Widely used miRNA-pathway analysis techniques such as gene set enrichment and co-expression analysis primarily detect whether a single pathway is potentially regulated by a miRNA [4,21].Enrichment analyses evaluate the presence (overlap) of targets of a single miRNA in a single pathway, aiming to identify pathways with a higher number of targets than expected by chance [21,22,[28][29][30][31][32].Many enrichment-based tools rely on precalculated miRNA-pathway relationships and require users to predetermine their pathways of interest from the disease data [23,27,33].Alternatively, correlation methods evaluate the association of the expression of a single miRNA with a gene or a proxy value representing the activity of a pathway [4,34].Table 1 describes some of the most widely used methods for miRNA pathway analyses, their scope, implementation and approaches.A one-to-one approach to miRNA-pathway analysis does not match the regulatory landscape of miRNAs.Largescale functional processes in health and disease coordinate across pathways in multiple ways, including gene sharing, pathway co-activity, multi-pathway co-regulation and cross-talk [35][36][37][38][39][40].The shortcoming of current approaches in accounting for these complex relationships and disease-specific expression dynamics limits our ability to detect the potential of a miRNA to regulate highly specific or broadly acting gene expression programs.
We have developed a framework to address the existing limitations of miRNA-pathway analysis from a systems perspective, to uncover how functional groupings of pathways are coordinated by miRNAs to form gene expression programs.Our system, Pathway networks of miRNA Regulation (PanomiR), discovers central miRNA regulators based upon their ability to target coordinate transcriptional programs.It determines if a miRNA concurrently regulates and targets a coordinate group of disease-or functionassociated pathways, as opposed to investigating isolated miRNApathway events.PanomiR derives these multi-pathway targeting events using predefined pathways, their dysregulation in disease Table 1: Overview of standard miRNA-pathway analysis methods and PanomiR

Method/Reference
Multi-pathway targeting

Pathway coordination/interaction
Open-source software

X Precalculated
MITHrIL [24] X (via DEG and DE miRNA) X X miRTar [25] X Web portal non-functional BUFET [26] X X miTalos [27] X Web portal non-functional X Precalculated states, their relative co-activation, gene expression and annotated miRNA-mRNA interactions.PanomiR profiles the activity of pathways and identifies disease-specific differentially regulated pathways by extending established activity summarization techniques we have previously developed such as Pathprint and Pathway-Drug Network [41][42][43].PanomiR identifies network modules of dysregulated pathways, derived from the pathway co-expression network (PcXN.org), to define broad-scale differentially regulated pathway groups [41][42][43].PanomiR then determines the miRNAs that target these coordinate pathway groups using a novel statistical test and predetermined miRNA-mRNA interactions [44,45].Taken together, these steps produce broad-scale, multi-pathway and disease-specific miRNA regulatory events (Figure 1).PanomiR is available to the community as a free and open-source, userfriendly Bioconductor package.

Overview
PanomiR takes as input a user-provided gene expression data set (e.g., RNA-Sequencing (RNA-Seq)) to quantify pathway activity profiles based on annotated pathway databases such as the Molecular Signatures Database (MSigDB) (Figure 1A and B) [46].Pathway activity profiles are then compared between two conditions (e.g., cancer versus control, wild type versus knockout) to identify and prioritize differentially regulated pathways (Figure 1C).PanomiR constructs a co-activity network of differentially regulated (or disease dysregulated) pathways to determine broad-scale condition-associated groups of functions.It then deconvolves the co-activity network into coherent functional groups using reference pathway co-expression networks, leveraging our previously described pathway activity methods (Figure 1D and E) [41][42][43].PanomiR then integrates user-provided miRNA-mRNA interactions (such as predicted targets from TargetScan [44] or experimentally validated interactions from TarBase [45]) to evaluate miRNA regulatory effects on coordinate pathway groups (Figures 1F and 2).The final output of PanomiR is a ranked list of central miRNAs, together with statistical significance levels for each group of differentially regulated pathways, providing an effective means for identification of pathway groups and for key miRNA prioritization, ranking and target detection.

Capturing pathway activity dynamics
Building upon our previous methodology, Pathprint [41][42][43], PanomiR ingests a user-provided gene expression data set and calculates pathway activity scores and so captures pathway functional dynamics (Figure 1B).The scores are proxy values for the activity of genes in individual pathways, which, in turn, represent biologically meaningful functional units.By capturing gene expression levels as pathway activity scores, inherent complexity is reduced while tolerance to noise is increased when compared to gene-centric analyses [4,42,43,47].Pathway activity scores leverage the complex inter-relationships and co-activity of genes.Pathway activity scores examine biological functions in a continuum and detect biological signals where standard differential gene expression analyses fail [4,34,41,43,[47][48][49].
We capture pathway activity profiles in a two-step process: (a) we rank genes in each sample in descending order, according to their expression, i.e. the highest expressed gene gets the largest rank-score, and (b) we calculate the average squared ranks of genes that belong to a pathway as the activity score.Formally, for a pathway X with n genes, Pathway x = g x 1 , . . ., g x n , the activity score, Ac a,x , in sample a is where rank a g x i refers to the rank of gene g x i (descending order) in sample a based on expression values.We generate activity profiles for each pathway of interest in each sample.Then, pathway profiles are normalized across the input samples (Supplementary Material).PanomiR uses the canonical pathways collection from MSigDB as its pathway database reference [46].MSigDB is a carefully curated database that represents non-redundant pathways from established pathway repositories such as KEGG and Reactome [46,50,51].

Detection of differentially regulated disease-associated pathways
PanomiR compares pathway activity profiles between case and control subjects to determine functional dynamics in disease.PanomiR defines differentially regulated pathways by determining statistically significant differences in pathway activity profiles between cases and controls using linear models, implemented in the Limma package (Figure 1C) [52].In contrast to enrichment analysis, the linear modeling framework of PanomiR determines the directionality of differential regulation: it defines whether a pathway is upregulated or downregulated in disease subjects (or experimental conditions) and accounts for confounding variables such as batch, sequencing center or any other fixed effects and continuous covariates.PanomiR outputs an ordered table of differentially regulated pathways along with P-values of differential regulation, adjusted for multiple hypothesis testing using false discovery rate (FDR) [53].

Detection of groups of differentially regulated pathways via their co-expression networks
Dysregulation of an individual pathway is rarely an isolated event since pathways share activity and are often co-regulated.PanomiR accounts for co-regulation to place differentially regulated pathways into groups that represent high-level disease programs by exploiting the PCxN [43]: a reference tool that organizes and assesses the shared activity of pathways (Figure 1D).PanomiR leverages PCxN's network, generated from a curated data set of 3207 expression profiles, providing an independent platform, to query co-activity of all pathways in the MSigDB data set [43,54].
PanomiR masks PCxN to contain only the subnetwork of differentially regulated pathways that were identified from the two-group data analysis in the previous step.In the masked network, nodes represent differentially regulated pathways and edges activity-correlation of pathways.PanomiR subsequently identifies densely interconnected differentially regulated pathway subnetworks using graph clustering algorithms (Figure 1E).The default clustering algorithm of PanomiR is Louvain, but PanomiR can use other clustering methods that are available in the igraph R-package [55].The subnetworks denote clusters of highly correlated coordinate groups of differentially regulated pathways driving disease-or condition-specific functions.

miRNA prioritization within clusters of differentially regulated pathways
PanomiR exploits the concept that a coordinate group of diseaseassociated pathways has common miRNA regulators.Using annotated miRNA-mRNA interactions and an empirical statistical test (Figure 2), it analyzes clusters of differentially regulated pathways, to define central miRNAs and captures the extent to which the targets of a specific miRNA are present within a group of coordinate pathways.miRNA regulatory events are then identified in three sequential steps (Figure 2): (i) by calculating individual miRNA-pathway overlap scores, (ii) by generalizing miRNA , for a miRNA X with respect to C, an observed cluster of pathways.The cluster-targeting statistic is an average individual overlap score for each miRNA-pathway pair.Individual overlap scores (e.g., S1, S2) are functions (inverse normal) of the overlap statistic (Fisher's exact test) between the miRNA target genes and the pathway member genes (B) PanomiR generates an empirical distribution of cluster-targeting scores for a miRNA X by randomly selecting a set of pathways and recalculating the cluster-targeting score.(C) The prioritization P-value is calculated by comparing the observed targeting statistic, S x c , to the null distribution of the targeting scores of miRNA X.The P-value is used to rank the miRNAs.targeting scores to a group of pathways (i.e., a cluster of differentially regulated pathways) and (iii) by estimating the statistical significance of miRNA targeting scores using an empirical approach.The empirical statistical tests are specific to the input data set, for each miRNA and each cluster of differentially regulated pathways.
In the first step, PanomiR derives the overlap scores for individual miRNA-pathway pairs using P-values from Fisher's exact test, capturing overrepresentation of targets of a specific miRNA in each individual pathway.To make analysis disease-, condition-, tissue-or cell-type-specific, PanomiR calculates overlap scores using only the genes expressed in the input experiment.In the second step, an overall targeting score for a given cluster of pathways (Figure 2) is derived.The clusters of pathways are generated in the previous step using PCxN.Formally, for each cluster of differentially regulated pathways, C, the targeting score of a miRNA x is where Φ −1 (.) denotes the inverse of the standard normal cumulative distribution function (CDF) and P xy denotes the Fisher's exact test P-value of overlap between the targets of miRNA x and genes of pathway y.The targeting-score, S c x , is related to Stouffer's method (with equal weights) for P-value aggregation.The inverse normal CDF avoids extreme cases in which an miRNA has many targets in one pathway and only a few targets in other pathways in a cluster.
In the third step, the statistical significance of the targeting score S c x is determined in order to produce cluster-specific lists of miRNAs ranked by targeting P-values.The targeting-score does not constitute, by itself, an unbiased measure of miRNA-targeting as it might depend on the number of targets of a miRNA.To create an unbiased measure, PanomiR also derives an empiricaltargeting P-value, P(S c x ), for a score of S c x .This P-value denotes the probability of observing a larger-targeting score from a random cluster of pathways (with | C | members) than the one observed.This empirical probability is derived using a bootstrap sampling approach by selecting randomized groups of pathways and recalculating their cluster targeting score.This approach directly tackles known or unknown biases in gene annotations for miRNA targets, as have been discussed by our group [21] and others [56,57].The output P-values are then adjusted for multiple hypothesis comparison using the Benjamini-Hochberg FDR [53].
Given the computational cost of bootstrap sampling, especially to calculate small P-values, PanomiR employs a Gaussian approximation approach to estimate P(S c x ).In clusters of largeenough size (>30 pathways), S c x values follow a normal distribution according to the central limit theorem.PanomiR uses precalculated Gaussian distribution estimates from 1000 random S x values to overcome the computational costs in these cases.In the last step, miRNAs are prioritized based on P-values for targeting each cluster.PanomiR's framework was assessed in a case study of a liver cancer data set from The Cancer Genome Atlas (TCGA) with 368 primary tumor samples and 49 controls (details in Supplementary Material).

Parameter sensitivity analysis
All steps of PanomiR's framework (Figure 1) were evaluated across input parameters and algorithmic choices.The sensitivity of PanomiR's pathway activity profiling (steps 1-3) was evaluated using synthetic data derived from randomized pathways and disease groups (case/control) in comparison with biologically meaningful pathways and disease samples (details in Supplementary Material, Supplementary methods).The consistency of identified pathway clusters from PCxN was evaluated by assessing clustering algorithms and pathway input sizes and compared with Jaccard similarity analysis (steps 4-5).Sensitivity analysis of Gaussian estimated miRNA P-values of pathway groups was performed with respect to individual pathways using jackknife estimation (step 6).The sensitivity of miRNA prioritization to parameters of predicted miRNA-mRNA interactions was assessed using Jaccard similarity analysis (step 6).Supplementary material contains details of the implementation and specifications of parameter sensitivity analysis.

Comparative assessment of PanomiR and established miRNA-pathway analysis tools
PanomiR's miRNA prioritization was compared with widely used miRNA-pathway analysis tools DIANA-miRPath v4.0, MIENTUR-NET and MITHRIL (parameter details in Supplementary Material) [22,24,33].PanomiR was also directly compared against Fisher's exact test (hypergeometric test) because the latter is incorporated in the vast majority of existing tools including DIANA-miRPath, MIENTURNET, miTalos and miRPathDB v2.0 [22,23,27,33].In addition to packages and webservers, two adaptations of existing miRNA-pathway analysis approaches were implemented and assessed in the PanomiR's ecosystem: (1) miRNA hits: determining the number of pathways within a given cluster that were significantly enriched for targets of a miRNA and (2) P-value aggregation: using Fisher's method and/or Stouffer's method to obtain one single P-value that combines enrichment P-values of a miRNA within all pathways in a cluster.In addition to evaluation of miRNA prioritization, PanomiR's pathway activity profiles were compared to MITHRIL's results for assessment of diseaseassociated pathways.PanomiR was compared to other tools in terms of number of miRNAs detected, biological relevance of miRNAs attributed to their respective pathway groups and bias in prioritization of miRNAs with a large number of targeted genes.

RESULTS
We present a case study of PanomiR's utility to provide a systematic, unbiased and biologically meaningful determination of regulatory miRNAs.We applied our system (Figure 1) to a liver cancer gene expression data set from The Cancer Genome Atlas (TCGA) comprising 368 primary tumor samples and 49 controls (data preprocessing detail in Supplementary Material, Supplementary methods) [58,59].PanomiR identified differentially regulated pathways and uncovered their regulating miRNAs in liver cancer.We assessed the biological relevance of the readouts and evaluated the parameter sensitivity and statistical robustness of PanomiR in silico.Our framework recapitulated known central miRNAs in hepatocellular carcinoma with a biologically meaningful assignment of pathways under their regulation, unbiased by the number of genes targeted by each miRNAs.By comparing PanomiR's results with liver cancer literature and other miRNApathway analysis tools, we demonstrate its ability to unbiasedly infer informative multi-pathway targeting events by miRNAs.

PanomiR detects multiple liver cancer-associated pathways
We generated and compared pathway activity profiles from normal tissues adjacent to tumor (TCGA abbreviation: NT, n = 49) and primary solid tumors (TCGA abbreviation: TP, n = 368) from liver cancer gene expression RNA-Seq data and using the MSigDB pathway database.PanomiR detected 428 upregulated and 397 downregulated pathways in TP compared to NT (FDR < 0.01, total pathways 1220; Tables 2 and S1).The large-scale differences in pathway activity profiles closely mirror the differential expression results at the gene level where more than 50% of the genes were differentially expressed (DE) based on a similar statistical design (FDR < 0.01; n = 7801; total genes = 14 212).
Differentially regulated pathways ref lected well-established dysregulated functions in liver cancer (Table 2).For example, NUCLEAR SIGNALING BY ERBB4 was downregulated in TP and activated in NT and has the highest statistical significance among all pathways (Figure 3A, Table 2).Downregulation of ERBB4 in tumors is in concordance with a well-established body of evidence on the roles of ERBB signaling as a tumor suppressor in liver cancer [60,61].In addition, we found downregulation of HDL-MEDIATED LIPID TRANSPORT in tumor tissues, corroborated by several reports on lipid disorders in liver cancer including decreased plasma levels of HDL [62,63].These results suggest the utility of PanomiR in detecting differentially regulated disease functions through pathway activity analysis.
We compared the pathway readouts of PanomiR with pathway enrichment analysis of DE genes from the same data set.Enrichment analysis (Fisher's exact test) identified 51 enriched pathways (FDR < 0.01, Table S2) within the DE genes (differential gene expression: FDR < 0.05; |LogFC| > 1; Supplementary material).Of these enriched pathways, 50 were also determined as differentially regulated by PanomiR.Significant overlap between the results suggests that PanomiR recapitulates the majority of enrichment analysis readouts (Fisher's exact test P-value = 3.5 × 10 −8 ).PanomiR detected liver cancer pathways that were missed by enrichment analysis (Tables 2 and S1).For example, the top liver cancer-associated pathway according to PanomiR, NUCLEAR SIGNALING BY ERBB4, was not detected by enrichment analysis (P-value = 1).Enrichment analysis (overrepresentation test) prioritizes pathways with more DE genes than expected by chance.This means that enrichment analysis misses pathways with differential activity between disease and controls subjects that do not have increased DE gene proportions.These results highlight two main advantages of PanomiR over pathway enrichment analysis: (a) the ability to detect significant functional dysregulation in disease even in absence of significant differential gene expression and (b) the ability to determine the direction of differential pathway regulation, i.e. upregulation or downregulation.

Synthetic data analysis shows PanomiR captures biologically meaningful signals
To assess the recapitulation of biological signals by PanomiR, we employed two randomization tests (Supplementary Material, Supplementary methods).First, we asked to what extent PanomiR detected differentially regulated pathways in a random assignment of samples to case and control groups in liver cancer (i.e.biologically meaningless classes).PanomiR found a very small number of differentially regulated pathways (mean = 0.054, SD = 1.2) via randomized case/control sample assignment (Table S3).This means that PanomiR is not prone to detect spurious findings (i.e., differentially regulated pathways) in absence of a biological signal.
Next, we examined if using biologically meaningful pathways (as annotated in the MSigDB) demonstrated any advantages over using randomly assigned gene sets.We generated randomized pathways by permuting gene labels to conserve the pathway overlap structure of the original MSigDB data set.We found that annotated gene sets generate a significantly larger number of differentially regulated pathways than randomized ones (onesided z-test P-value <3.34 × 10 −5 ; mean = 693.785pathways at an Table 2: Detection of differentially regulated pathways in liver cancer.Most significant differentially regulated pathways identified by PanomiR according to the P-value of differential activity between tumor (TP) and normal tissues (NT) from TCGA database.Differential regulation P-values were derived using linear models using the limma package by comparing pathway activity profiles of TP versus NT.Differential activation adjusted P-values are for multiple hypothesis testing using FDR.Direction denotes upregulation or downregulation of pathway activity in TP versus NT.Enrichment adjusted P-values for pathways are provided for comparison.Enrichment P-values were derived from the differentially expressed genes (|FC| >1, FDR <0.05).The column '#DE genes' shows the number of differentially expressed genes (TP versus NT) that are present in the pathway  S3, Figure S1).This finding shows that biologically meaningful gene sets were more likely to sensitively capture biological signals.

Identification of coordinate clusters of differentially regulated pathways
Pathways coordinate and co-regulate through various mechanisms, including shared genes.We used the PCxN-where edges represent precalculated correlations between pathways based on independent gene expression data-to detect coordinate groups of differentially regulated pathways [43].We mapped the 200 most statistically significant differentially regulated pathways onto the PCxN network.We then performed Louvain clustering to identify coordinate pathway groups among the top 200 pathways.PanomiR identified three major clusters of differentially regulated pathways (Figure 4) with consistent functions: (i) the largest cluster of differentially regulated pathways (cluster A) contained pathways upregulated in cancer such as SPLICEOSOME, PROTEASOME, TRANSLATION, RNA POLL II TRANSCRIPTION and SIGNALING BY WNT (Supplementary Table S4).Wnt signaling activation is a critical mechanism for transformation of precancerous lesions into liver cancer through proliferation [64].(ii) The second largest cluster (cluster B) contained pathways related to cell cycle and proliferation (Figure 4, Table S4).(iii) The third cluster (cluster C) contained liver cancer-associated signaling pathways that were either down or upregulated in cancer with terms related to ERBB signaling, IL signaling and NOTCH signaling (Table S4).Differentially regulated pathways within clusters A and B showed a coherent direction of differential regulation in cancer (TP) versus normal adjacent tissues (NT), suggesting coordinate multi-pathway dysregulation driving high-order disease functions.

Detection of regulatory miRNAs that target clusters of differentially regulated pathways
We evaluated whether coordinate clusters of differentially regulated pathways have common miRNA regulators.In our case study, we examined both experimentally supported (TarBase v8.0; >500 K interactions) and predicted miRNA-mRNA interactions (Targetscan v7.2; >113 K interactions) to detect miRNAs that target each cluster of differentially regulated pathways (Tables 3, 4, S5 and S6).Our results showed that PanomiR identified distinct informative miRNAs for each cluster of liver cancer-associated pathways.

Experimentally supported interactions
PanomiR detected 202, 104 and 1 miRNA regulators in clusters A, B and C, respectively (FDR < 10 −5 , Tables 3 and S5).These included known liver cancer-associated miRNAs with consistent   2).Edges represent canonical coexpression between two pathways, obtained from an independent compendium of gene expression data, derived from the PCxN method.Node colors represent unsupervised network clusters found by the Louvain algorithm.Clusters were manually labeled according to the functional consensus of their pathways.
Table 3: PanomiR prioritizes regulatory miRNAs in liver cancer using experimentally validated interactions.Prioritized miRNAs for each identified pathway cluster, ranked by PanomiR targeting P-value (Figure 2).miRNAs are prioritized based on experimentally validated miRNA-mRNA interaction from TarBase V8.0.Enrichment analysis results are provided for comparison.The column '#Pathways enriched' denotes the number of pathways in the cluster with significant (FDR < 0.25) enrichment in the targets of each miRNA, derived using Fisher's exact test   [73,74]; and miR-103a-3p is a promoter of proliferation that is highly dysregulated in liver cancer [75].In cluster C, we found miR-410-3p as a central regulator of the relevant module.This miRNA has been shown to be a circulating biomarker of distant metastasis into the lung and the liver [76,77]; it also regulates adenomas via signaling pathways such as MAPK, PTEN/AKT and STAT [78,79].In cluster C, we also found a significant targeting role for miR-552-3p, which has been associated with liver cancer and regulates various hallmarks of cancer [80].Supplementary material provides an examination of the relationship between PanomiR miRNAs with differentially expressed miRNAs in liver cancer.While we did not find a significant association between prioritization by PanomiR and differential expression, PanomiR attributes distinct DE miRNAs to distinct groups of pathwaytargeting events-providing a knowledge-driven approach for functional characterization of data-driven disease miRNAs (Tables S7-S9).Our results establish that PanomiR successfully detects key regulating liver-cancer miRNAs and their downstream differentially regulated pathways based on gene expression data.
Although PanomiR detected multiple liver cancer-associated miR-NAs from predicted interactions, the set of prioritized miRNAs was different than that of experimentally supported interactions (Tables 4 and S6).For example, PanomiR prioritized miR-299-3p in cluster C, a regulator of IL and STAT signaling pathways in liver cells, which have several associated annotated pathways in cluster C [81].Our results suggest that predicted and experimentally validated miRNA interactions databases produce complementary results, and both should be considered for the downstream analysis of transcriptomic data.

Parameter sensitivity analysis
We evaluated the sensitivity of PanomiR to input parameters and algorithmic choices.Our results show that PanomiR produces consistent results across varying input sizes, clustering algorithms and cut-offs for selecting predicted miRNA-mRNA interactions.We also show that PanomiR's estimated miRNA prioritization P-values are robust and not driven by individual pathways.

Sensitivity to the number of input pathways
We asked whether the number of significant pathways selected in the module detection step (Figure 1D) affected the organization of pathway clusters.We iteratively performed Louvain clustering on the top pathways (ranging from 150 to 450 pathways with 50 increments) and assessed the Jaccard similarity between the top three clusters (Supplementary Material).Our results showed a strong conservation of module composition across varying parameter choices with a high-level one-to-one correspondence between the top three clusters in all iterations (Figure S2).

Assessment of different clustering algorithms
We investigated whether the choice of clustering algorithm affected the pathway module composition.PanomiR supports a range of clustering algorithms using the igraph package [55].We evaluated the Jaccard similarity between the top three clusters within the top 200 differentially regulated pathways identified by Louvain, edge-betweenness, Infomap and fast-greedy clustering algorithms [82][83][84][85].The fast-greedy and Louvain algorithms distributed pathways across three clusters, while the edgebetweenness and Infomap methods distributed the pathways mainly into two clusters (Figure S3A).Our results showed a strong one-to-one correspondence between the clusters generated by fast-greedy and Louvain as well as between the clusters generated by edge-betweenness and Infomap algorithms (Figure S3B).The results suggest an overall high-level stability of pathway modules using varying algorithms.

Stability of PanomiR's miRNA prioritization P-values
We assessed the robustness and sensitivity of PanomiR's miRNA prioritization by analyzing if the P-values of miRNA targeting were driven by individual pathways.We used a jackknifing strategy to recalculate PanomiR's P-values in the clusters of differentially regulated pathways by leaving out one pathway at a time (Supplementary Material).The jackknifed P-values perfectly correlated with the original PanomiR P-values (Spearman's rho = 1, Figure S4).This result shows that PanomiR's miRNA prioritization depends on the collective targeting of pathways by each miRNA and is undriven by individual pathways.Additionally, we compared miRNA P-values estimated using the Gaussian method with those derived from bootstrapping.We found a significant correlation (Spearman's rho = 0.9997, Figure S5), which demonstrates the stability of PanomiR's estimated P-values.

Parameter sensitivity in using predicted miRNA-mRNA interactions
We assessed PanomiR's miRNA prioritization using varying parameters for the selection of predicted miRNA-mRNA interactions in the TargetScan database (details in Supplementary Material).We deployed four strategies to generate lists of predicted miRNA targets.These strategies either used a combination of conserved miRNA families along with prediction scores or solely used various cut-offs for miRNA-mRNA binding prediction scores (Table S10).Using rank-based correlation analysis and hierarchical clustering, we show that PanomiR-derived miRNA ranking for targeting the three clusters of differentially regulated pathways is consistent and positively correlated across various selections of predicted miRNA-mRNA interactions (Figure S6).

Comparison of PanomiR with existing miRNA-pathway analysis tools
We compared PanomiR with widely used miRNA-pathway analysis tools including DIANA-miRPath v4.0, MIENTURNET and MITHRIL v2.1 [22,24,33].Many state-of-the-art tools use extensions of the over-representation analysis to infer miRNApathway interactions.To our knowledge, DIANA-miRPath is the only existing tool that directly assesses multi-pathway targeting by miRNAs.DIANA-miRPath uses Fisher's aggregation on miRNApathway over-representation P-values (referred to as 'termcentric' analysis).However, in contrast to PanomiR that aims to prioritize transcriptional programs spanning multiple pathways controlled by specific miRNAs, miRPath's aggregation statistic returns whether the selected miRNA significantly targets at least one of the identified pathways [33].Other tools, including miRPathDB v2.0, can determine how many pathways are enriched in gene targets of a queried miRNA, which we refer to as 'miRNA hits'.Unlike PanomiR, most tools (including DIANA-miRPath, miRPathDB and miTalos) do not provide an objective selection of input pathways and leave the input to the users.We compared our system with other tools using both the original software (where functional and applicable) and their adaptations within PanomiR's system.For a fair comparison using the same background of genes and pathways, we assessed two adaptations of existing miRNApathway analysis approaches in the PanomiR's ecosystem: 'miRNA hits' and 'P-value aggregation'.Specifically, we extended the enrichment analysis to a group (cluster) of pathways by interrogating the number of pathways within a given cluster that were significantly enriched for targets of a miRNA (miRNA hits).This 'miRNA hits' approach is similar to the functionalities within miRPathDB v2.0 and miTalos [23,27].For example, if the targets of a miRNA are significantly enriched in five pathways within a group of pathways, the miRNA receives a targeting (hit) score of 5.For the 'P-value aggregation' approach, we used Fisher's method to obtain one single P-value that combines enrichment Pvalues of a miRNA within all pathways in a cluster.This approach has been previously implemented in DIANA-miRPath v4.0 for multi-pathway targeting.We also analyzed Stouffer's P-value aggregation method within clusters to combine miRNA-pathway enrichment P-values as an alternative of Fisher's method.

Prioritization of disease-associated miRNAs
PanomiR successfully detected liver cancer-associated miRNAs that were not prioritized by enrichment analysis (miRNA hits) and P-value aggregation approaches (Tables 3 and S5).When using experimentally supported miRNA-mRNA interactions, enrichment analysis of cluster A revealed miR-525-3p as enriched in only 1 and miR-1307-5p in none out of 65 pathways (Table 3).Fisher's aggregation-adjusted P-values for miR-525-3p and miR-1307-5p were 1.3 × 10 −9 and 9.15 × 10 −1 , respectively (Table S5).When using predicted miRNA-mRNA interactions, PanomiR detected several miRNAs that were not detected by the number hits in the enrichment analysis (Table 4).It is worth noting that the enrichment tests (Tables 3 and 4) used a relaxed threshold (FDR < 0.25) to allow for more sensitive detection.Using a conservative cut-off (e.g., FDR < 0.05) would have retained an even lower detection rate of miRNAs.The results suggest that (a) PanomiR can detect liver cancer-associated miRNAs that are not detectable by simple enrichment tests or current miRNA pathway approaches and (b) a subset of critical liver cancer miRNAs can be detected only by analyzing a group of pathways in the form of coordinated programs and not by examining individual pathways.

Bias of existing methods in prioritizing miRNAs with more targeted genes
One-to-one miRNA-pathway enrichment analyses have been shown to be biased toward detecting miRNAs with a larger number of targets [56].For the standard multi-pathway strategies discussed above, we examined the relationship of the number of targets of a miRNA with its prioritization ranking (Tables 4 and S5).The enrichment score (hits) ranking of miRNAs significantly correlated with their number of gene targets.Stouffer's and Fisher's P-value aggregation methods also showed a strong correlation between the number of miRNA targets and the Pvalue ranking in cluster A of the TCGA data.These results suggest that current commonly available approaches for multipathway targeting are biased toward prioritizing miRNAs with more targets.In comparison, PanomiR did not show a significant correlation between ranking of a miRNA (based on P-value) and the number of its targets (Figure 5), suggesting its ability to prioritize miRNAs irrespective of the number of their gene targets.
In addition to our local implementation of miRPath v4.0 (Fisher's aggregation, Figure 5), we directly compared our results with the most recent online version of miRPath v4.0 using its 'Term-centric' analysis (Supplementary Material) [33].miRPath's portal only allowed 20 input pathways at each query.Thus, we manually queried the top 20 pathways in cluster A of the TCGA liver cancer data.miRPath v4.0 identified 193 miRNA targeting the top 20 pathways in cluster A (Table S1).Concordant with our Fisher's aggregation implementation results, miRPath showed a significant bias toward prioritization of miRNAs with a higher number of gene targets (Spearman's rho = 0.48, Figure S7).

Other comparisons
We compared PanomiR with MIENTURNET (a network-based prioritization method of regulating miRNAs) by querying the list of DE genes in the online portal [22].MIENTURNET identified three significant miRNAs targeting events (FDR < 0.05) (Table S12): miR-192-5p, miR-215-5p and miR-193-5p.MIENTURNET does not directly determine pathway deregulation events and infers miRNA targeting using gene interactions networks.The online portal detected functional pathway enrichment of miR-192-5p and miR-193b-5p using the interaction network of the input list genes.The MIENTURNET's enrichment results predominantly included terms related to cell cycle, overlapping with cluster B of PanomiR.PanomiR determined miR-192-5p as a top miRNA targeting the differentially regulated pathways in cluster B. We also compared PanomiR with MITHRIL, which detects pathway dysregulation using DE genes or DE miRNAs.MITHRIL detected nine significantly deregulated KEGG pathways (FDR < 0.01), mainly related to metabolic pathways.Among these, 'Drug metabolism -cytochrome P450', 'Retinol metabolism' and 'Linoleic acid metabolism' were represented in cluster C of PanomiR (Tables S4 and S13).Our results demonstrate that when compared against existing methods, PanomiR provides unique functionality, high sensitivity and results that are not affected by commonly observed biases.

DISCUSSION
PanomiR is a framework to determine miRNA regulation of multiple coordinately regulated pathways.Most of the existing tools for miRNA-pathway analysis are focused on one-toone miRNA-pathway relationships and lack the ability to infer relationships between miRNAs and their groups of coregulated pathways.Approaches to address this problem have used standard enrichment analysis as a basis for interrogation of multi-pathway targeting.However, this approach does not take into account the interaction between pathways and their relative expression dynamics [86].Alternative gene-networkcentric miRNA-analysis approaches, such as MIENTURNET, do not directly examine the interactions of miRNAs with known pathways.In network-based tools, disease pathways are only inferred from gene-network modules [22,[87][88][89].PanomiR addresses these challenges by deconvolving gene expression into coordinate groups of differentially regulated pathways; measuring the extent to which miRNAs target these groups.Applied to a case study of liver hepatocellular carcinoma, PanomiR captured broad-scale characteristics of cancers such as dysregulation of transcription, cellular replication and signaling (Figure 3).These groups, composed of differentially regulated pathways, represented coherent higher-order functional units that recapitulated specific, yet central, disease mechanisms.
The use of pathway activity profiles is a key component of PanomiR.It sensitively detects differentially regulated pathways and provides granular definition of coordinate functional groups (Figures 2 and 3, Table 2).PanomiR detected critical known liver cancer pathways, even though there were few differentially expressed genes associated with them (Table 2).Pathway activity profiles in PanomiR determine the up-and downregulation of pathways.The activity profiles are directly comparable and translatable across different experiments, which makes it possible to leverage co-expression of pathways to detect disease-specific functional dynamics and themes across data sets, platforms and species [41].The number of enriched pathways for a miRNA significantly correlated with its number of gene targets.We also observed a significant correlation between the number of a miRNA's targets and its prioritization ranking based on (C) Stouffer's method and (D) Fisher's method for aggregation of enrichment P-value.X-axes denote the log number of gene targets of miRNAs based on experimentally validated miRNA-mRNA interactions from the TarBase database.The y-axis in (B) represents the number of significantly enriched pathways (Adjusted P-value < 0.25, Table 3).
PanomiR's multi-pathway approach provided unbiased detection of miRNA regulatory events in liver cancer that was not detectable by conventional analyses (Figure 4, Table 3).By using 113 K high-confidence predicted miRNA interactions (TargetScan) and more than 500 K experimentally supported (TarBase) miRNA targets, PanomiR discovered informative and complementary miRNA regulatory events (Tables 3 and 4).The prioritized miRNAs identified by PanomiR have distinct roles in the disease and target pathways.The results also show that PanomiR can identify key miRNAs that target groups of pathways even with only a few targeted genes within individual pathways.

CONCLUSION
PanomiR provides an advancement over the current practice of studying static, isolated miRNA-pathway interactions.It is the first systems biology framework to study multiple differentially regulated pathways, their co-activity and their regulating miRNAs.The model achieves its broad-scale inference by accounting for co-expression of pathways and diseasespecific expression dynamics to identify miRNA-regulatory events.
PanomiR provides multiple functionalities that are not available through established miRNA-pathway analysis tools.These include pathway dysregulation analysis, identification of groups of co-expressed pathways for miRNA prioritization and granular assignment of disease-associated miRNAs to specific groups of disease pathways.These features enable PanomiR to produce unbiased miRNA prioritization results, which, in turn, sensitively determine key disease miRNAs and their targeted pathways even when there are only few known gene targets or significant differentially expressed genes.
Limitations of the PanomiR approach include (1) PanomiR detects one-to-many miRNA-pathway relationships and does not provide the analysis of many-to-one relationships; (2) PanomiR's infers miRNA from gene expression data.In addition, PanomiR does not provide co-expression analysis between pathways and miRNAs.Additional cross-examination with miRNA expression data may be necessary to make the results more actionable, representing a potential area for future expansion; (3) PanomiR primarily works with bulk RNA-Seq data and is not designed for single-cell/spatial transcriptomic data sets.For users who are interested in single-cell/spatial transcriptomic analysis of miRNA-pathway relationships, we suggest adapting a pseudobulk transformation prior to using PanomiR; and (4) the complete landscape of miRNA-mRNA binding relationships is unknown.This gap can drive discrepancy in miRNA prioritization based on different background data sets of miRNA-mRNA interactions (examples in Tables 3 and 4).In order to mitigate this gap, we have made PanomiR adaptable to varying miRNA-mRNA interaction databases, pathway gene sets and gene expression data sets to facilitate user-specific study designs and research questions.PanomiR is available to the community as an opensource R/Bioconductor package.

Figure 1 .
Figure 1.PanomiR workf low.PanomiR prioritizes miRNAs that target coordinate groups of pathways.(A) Input gene expression data set and a set of annotated pathways.(B) Gene expression data are summarized into pathway activity scores.(C) Pathway activity profiles are compared between disease and control subjects to discover differentially regulated pathways.(D) Differentially regulated pathways are mapped to the canonical PCxN, where nodes denote pathways and the edges denote correlation of activity scores.(E) Within the network of differentially regulated pathways, modules of coordinate pathways are identified using graph clustering algorithms.(F) miRNAs are prioritized using annotated miRNA-mRNA interactions (known or predicted) for preferential targeting within each cluster of differentially regulated pathways.The outputs of the pipeline are individual lists of miRNAs with prioritization scores (targeting P-values) per each cluster of pathways.

Figure 2 .
Figure 2. miRNA prioritization from pathway clusters.(A) PanomiR generates an observed targeting statistic, S x c, for a miRNA X with respect to C, an observed cluster of pathways.The cluster-targeting statistic is an average individual overlap score for each miRNA-pathway pair.Individual overlap scores (e.g., S1, S2) are functions (inverse normal) of the overlap statistic (Fisher's exact test) between the miRNA target genes and the pathway member genes (B) PanomiR generates an empirical distribution of cluster-targeting scores for a miRNA X by randomly selecting a set of pathways and recalculating the cluster-targeting score.(C) The prioritization P-value is calculated by comparing the observed targeting statistic, S x c , to the null distribution of the targeting scores of miRNA X.The P-value is used to rank the miRNAs.

Figure 3 .
Figure 3. Pathway activity analysis of the Liver Hepatocellular Carcinoma data set from the Cancer Genome Atlas.(A) Detection of differentially regulated liver cancer pathways by comparison of pathway activity profiles between normal tissues adjacent to tumors (NT) and tumor primary (TP) samples from TCGA data.Boxplots show the most significant differentially regulated pathways selected based on P-values of difference between NT and TP (Table2).(B) Principal component analysis (PCA) projection of the samples based on either all genes or all pathways.Pathway summarization in PanomiR allows to analyze the activity of pathways in a continuum.PCA of pathways conserves sample groups and captures a higher variation compared to the PCA of genes.

Figure 4 .
Figure 4. PanomiR deconvolutes coordinate clusters of differentially regulated pathways in liver cancer.The network displays a pathway co-expression map of liver cancer pathways.PanomiR detected three major groups of pathways, defined by the direction of differential regulation and clusters of coexpression.The three classes are (i) activation of transcription in tumors (cluster A); (ii) activation of cellular replication (cluster B); and (iii) deactivation of specific signaling pathways (cluster C).Each node in the network represents a differentially regulated pathway (Table2).Edges represent canonical coexpression between two pathways, obtained from an independent compendium of gene expression data, derived from the PCxN method.Node colors represent unsupervised network clusters found by the Louvain algorithm.Clusters were manually labeled according to the functional consensus of their pathways.

Figure 5 .
Figure 5. Unbiased prioritization of miRNAs by PanomiR.PanomiR prioritizes miRNAs with either a small or large number of annotated targets.In contrast, enrichment-based miRNA-prioritization methods are biased toward prioritization of miRNAs with larger numbers of gene targets.The figure displays correlation analysis of miRNA-prioritization rankings with the number of gene targets in Cluster A of the liver cancer data set.Each point represents a miRNA annotated in the TarBase database.(A) Spearman correlation analysis did not find a significant association between the number of targets and the prioritization ranking of miRNAs by PanomiR (correlation −0.03).(B)The number of enriched pathways for a miRNA significantly correlated with its number of gene targets.We also observed a significant correlation between the number of a miRNA's targets and its prioritization ranking based on (C) Stouffer's method and (D) Fisher's method for aggregation of enrichment P-value.X-axes denote the log number of gene targets of miRNAs based on experimentally validated miRNA-mRNA interactions from the TarBase database.The y-axis in (B) represents the number of significantly enriched pathways (Adjusted P-value < 0.25, Table3).